arm64: Replace RSH/RSZ -> CAST nodes with clearing register#121007
arm64: Replace RSH/RSZ -> CAST nodes with clearing register#121007jonathandavies-arm wants to merge 9 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
|
Please can I have a review? @dotnet/arm64-contrib @EgorBo |
|
cc: @a74nh @JulieLeeMSFT |
|
/azp run Fuzzlyn |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Looks like Fuzzlyn got stuck? |
|
Could someone run fuzzlyn again on this please. Previous one got cancelled by the CI, I think. |
src/coreclr/jit/lower.cpp
Outdated
| if (!cast->isContained() && !cast->IsRegOptional() && !cast->gtOverflow() && | ||
| // Smaller CastOp is most likely an IND(X) node which is lowered to a zero-extend load | ||
| cast->CastOp()->TypeIs(TYP_LONG, TYP_INT)) | ||
| // Try to recognize right shift with a CAST node that is equivilent to mov #0 |
There was a problem hiding this comment.
This optimization should be useful for all platforms. Why restrict it to Arm64?
This would also likely be more beneficial if implemented in morph, where it could enable further downstream optimizations.
There was a problem hiding this comment.
This optimization should be useful for all platforms. Why restrict it to Arm64?
This would also likely be more beneficial if implemented in morph, where it could enable further downstream optimizations.
Agreed. There is nothing fundamentally architecture specific here, just replacing an overflowing shift with zero.
My only concern would be if for some reason the casts weren't being introduced until after all the morph passes. But, I don't think that's going to happen.
There was a problem hiding this comment.
We can always have it in both places, but I do think that morph is the more meaningful location here.
A more comprehensive version of this is #122533, which also handles other optimizations but is also doing it in lowering.
There was a problem hiding this comment.
I checked out the work in #122533 and ran my tests in this PR and they don't pass. I think both of these PRs are doing different optimisations. The Fix section in the other PR doesn't describe the situation I'm trying to optimise.
There was a problem hiding this comment.
I've moved the optimisation into morph.
| GenTreeIntCon* cns = op2->AsIntCon(); | ||
| if (!cast->gtOverflow() && cast->CastOp()->TypeIs(TYP_INT) && varTypeIsUnsigned(cast->CastToType())) | ||
| { | ||
| ssize_t shiftAmount = cns->IconValue(); |
There was a problem hiding this comment.
Can you add a small comment around here saying what the transform does? Just similar to the one on line 11279.
Change lgtm otherwise
There was a problem hiding this comment.
Added a comment.
|
No diffs in SPMI -- do we think this transformation is worth it? |
Hi @jonathandavies-arm , echoing Jakob's question. Does this transform go after a specific scenario you were looking at? |
There isn't a specific scenario we are looking at. This tasks was one of a list of general improvements that we thought we should do. The asmdiff throughput doesn't show any increase in instructions although the arm64 linux has failed to build. https://dev.azure.com/dnceng-public/public/_build/results?buildId=1273270&view=ms.vss-build-web.run-extensions-tab Here is a link to compiler explorer https://godbolt.org/z/jf5455rcb that shows a similar optimisation done by GCC and Clang. I do think this is a useful optimisation but we can close it if you feel it isn't worth it. |
It is not unusual for us to implement optimizations but to end up closing them when we find small or no impact, or even remove existing optimizations in those scenarios. It is not purely about JIT throughput but also about maintaining them in the future and consistency in the optimization phases. We like to see some motivating factors, like asm diffs improvements or simplifying upcoming changes. With that in mind I will close this, but feel free to reopen if you have any other motivation for the change. |
In lowering change a down cast and right shift into a mov w0, wzr if the shift amount is constant and greater & equal to the size of the downcast type. e.g.
assembly changes from
to